Risk score models for urinary tract infection hospitalization

Annually, urinary tract infections (UTIs) affect over a hundred million people worldwide. Early detection of high-risk individuals can help prevent hospitalization for UTIs, which imposes significant economic and social burden on patients and caregivers. We present two methods to generate risk score models for UTI hospitalization. We utilize a sample of patients from the insurance claims data provided by the Centers for Medicare and Medicaid Services to develop and validate the proposed methods. Our dataset encompasses a wide range of features, such as demographics, medical history, and healthcare utilization of the patients along with provider quality metrics and community-based metrics. The proposed methods scale and round the coefficients of an underlying logistic regression model to create scoring tables. We present computational experiments to evaluate the prediction performance of both models. We also discuss different features of these models with respect to their impact on interpretability. Our findings emphasize the effectiveness of risk score models as practical tools for identifying high-risk patients and provide a quantitative assessment of the significance of various risk factors in UTI hospitalizations such as admission to ICU in the last 3 months, cognitive disorders and low inpatient, outpatient and carrier costs in the last 6 months.


Introduction
Urinary tract infections (UTIs) affect around 150 million people worldwide annually, including 11 million in the United States (US) [1,2].A study in the US estimated that UTIs were associated with over 10.5 million physician visits and 2-3 million emergency department visits in 2007 [3].Furthermore, UTIs account for a substantial number of antibiotic prescriptions [4,5].Effective and timely outpatient care can help reduce the likelihood of hospitalization for UTI, as it is considered an ambulatory care sensitive condition [6].Hospitalizations are significantly more expensive than outpatient or primary care, thus potentially preventable hospitalizations are closely monitored to evaluate the efficiency of health systems [7].UTIs caused about 380,600 potentially preventable adult inpatient stays, costing 2.55 billion dollars in the US in 2017 [8].
Women are more likely to develop UTIs than men, with an estimated 50-60% of women experiencing at least one UTI in their lifetime [9].Women are also more likely to experience recurrent UTIs, which can further increase the risk of hospitalization [10].In addition to sex, other risk factors for UTIs include age, urinary catheterization, urinary tract abnormalities, pregnancy, and history of UTIs [3,4,11].Adults with cognitive impairment [12] and individuals with conditions such as diabetes, and immunosuppression are also at increased risk for UTIs [3,13].UTIs are also prevalent after a kidney transplant with two main triggers being vesicoureteral reflux and the use of immunosuppressive drugs [13].These population groups may require intensive management of UTIs and closer monitoring to prevent hospitalization.

Data
This is a retrospective study using Medicare Limited Data Sets (LDS) from 2008 to 2012.CMS made this fully anonymized data available to our group on 1/16/2020 for an Artificial Intelligence Health Outcomes Challenge to predict hospital and skilled nursing facility admissions and adverse events based on Medicare Fee-for-Service (FFS) Parts A and B administrative claims [28].The data set contains records from 2008-2012 for a random 5% sample of Medicare beneficiaries.Our study does not include minors.We are reporting a retrospective study of archived samples, all data were fully anonymized before we accessed them.The overall project was approved by the Institutional Research Board at North Carolina State University (IRB Protocol 20528) as exempt secondary research for which consent is not required.The data set contains sensitive patient information and cannot be shared publicly due to legal restrictions.Data storage, maintenance, and protection are governed by the Data Use Agreement between the university and the CMS.
To improve data consistency, we exclude beneficiaries who live in areas other than the 53 US states and territories.We also exclude beneficiaries with cost abnormalities including those with carrier claims but zero cost in carrier file, outpatient claims but zero cost in carrier or outpatient files, and those who had less than 2,000 USD total annual cost in carrier, inpatient, outpatient, and SNF claims.Furthermore, to prevent bias and improve generalizability of the model results, we exclude beneficiaries with urinary cancer and end-stage renal disease (ESRD) related claims.We also exclude beneficiaries who live in a nursing home.
We do not consider the first 3 months and the last month of 2011 and 2012 in our analysis to ensure consistent computation of rolling horizon variables, e.g., inpatient cost in the last 3 months.We also filter records of patients who are dead in hospital or hospice at the beginning of the current (observation) month, or who are enrolled in managed care at the beginning of the current month.We use data from April to November of 2011 for training, and data from 2012 during the same period for testing and performance evaluation.
In addition to medical claims, our data set contains relevant variables from publicly available datasets such as Population Census Elderly Living [29]; Immunization [30]; County Health Rankings [31]; Hospital and Nursing Home Compare Dataset [32,33]; Weekly U.S. Influenza Surveillance Reports [34].We also computed 285 Clinical Classification System (CCS) variables based on the ICD-9 diagnosis codes identified by AHRQ and added to our data.An extended description of our data can be found in [18].The final data used for the analysis contains 821 variables including demographic characteristics and medical history of beneficiaries, their healthcare utilization in the past 1-6 months, as well as provider quality metrics, census information, and community-based regional public health metrics (e.g., flu vaccine coverage in an area); (see Table 1).
Observation generation.Each observation (row) corresponds to a patient-month.The number of observations for a patient varies between 1 and 8 in 2011 and 2012.To ensure that the model only employs data from past events to predict future events, the outcome variable for each patient-month is the UTI hospitalization in the next month.We identify UTI hospitalizations based on the PQI 12 criteria (urinary tract infection admission rate) defined by AHRQ [27].More specifically, we considered inpatient admission claims in 2011 and 2012 for patients ages 18 years and older with a principal ICD-9-CM diagnosis of code for UTI: 59010, 59011, 5902, 5903, 59080, 59081, 5909, 5950, 5959, 5990.The PQI-12 definition excludes kidney or urinary tract disorder admissions, other indications of immunocompromised state admissions, obstetric admissions, and transfers from other institutions.It includes cystitis, pyelonephritis, and urinary tract infections.
Our data is imbalanced; UTI hospitalization is observed in less than 0.5% of the beneficiaries.To focus on a subpopulation with relatively high risk for UTI hospitalization, we consider beneficiaries who had at least one claim with a UTI diagnosis since 2008.The UTI diagnosis codes are based on the CCS category 159 (Urinary Tract Infections) [41].We exclude beneficiaries without any inpatient or Skilled Nursing Facility (SNF) claim.This subpopulation covers ~46% of all UTI hospitalizations in our data set, see Table 2.

Methods
Different approaches can be used for assigning points to model features when developing a risk score.In this study, we apply two methods.The first method, referred to as integerized logistic regression (LR), aims to develop a risk score with a given total score range.The second method, referred to as the credit scorecard model, regulates the increment in the total score associated with a specific level of increase in the odds of the outcome.
In the integerized LR method, variables are transformed into binary variables (e.g., six binary variables for each category of age), an LR model is built, and its coefficients are scaled and rounded.Treating each category within a variable as a dummy binary variable results in high collinearity between variables.We address this issue by limiting the number of features to 10 using a LASSO penalty in training [42].In the credit scorecard model, each categorical value of a variable is replaced by its weight of evidence (WOE) [43].We then select 15 variables with the highest information value (IV).The IV of a variable shows its strength as a predictor and is calculated as the weighted sum of the WOE values for all categories of that variable.The weight of each category is determined based on the difference between its frequency among events and non-events [44].We use the logistic regression LASSO method to select a maximum of 10 variables from 15 variables with the highest IV.The coefficients of this model are then scaled and rounded to enable risk score calculation with integer values.Unlike the integerized LR method, the credit scorecard model assigns a score to each category of a variable included in the model allowing users to evaluate how being in each category affects the overall  [36]; number of specialty visits in the last month (allergy, neurology, endocrinology, cardiology); number of emergency room visits, physician visits, hospital admissions, intensive care unit (ICU), cardiac care unit (CCU), and oncology stays in the last one and three months; length of stay in hospital and SNF in the last one and three months [37] Healthcare Spending Medicare and non-Medicare paid costs of inpatient, outpatient, SNF, carrier, DME, home health, hospice claims in the last one and six months [37,38] Most Recent Provider's Quality Metrics Hospital overall rating of the beneficiary's most recent inpatient provider; hospital bed ratio in the beneficiary's county of residence; count of outpatient procedures; emergency room volume [39]; complication rates for hip/knee replacement patients; postoperative complication rates; rate of blood stream infection after surgery; and readmission rates due to heart attack, pneumonia, etc. [29,31] Community-Based Metrics Rural indicator for above 99% of beneficiary's county population living in rural areas, county median household income [40]; state-level flu activity and vaccine coverage, safety score for the most recent county of residence; population statistics about race, education, etc. [29,31] https://doi.org/10.1371/journal.pone.0290215.t001 score.Each method is explained in further detail below after describing the variable categorization.
Variable categorization.Our data consists of binary, continuous, and categorical variables.We discretize continuous variables because the credit scorecard model requires categorical inputs.Furthermore, categorization improves the interpretation of the cost and the number of claim variables whose values are concentrated around zero with only a few large values.We use a binning algorithm [45] to create categories with monotone WOE.For each category of a variable, WOE is calculated as the log of the ratio of its frequency among events to that among non-events.In our case, the event refers to an UTI hospitalization.Thus, a higher WOE for a category implies a higher risk for UTI hospitalization.
Integerized LR method.Let X 1 , X 2 ,. .., X d denote vectors of the considered variables for each beneficiary, and Y be the binary response variable indicating the UTI hospitalization.We assume Y follows a Binomial distribution, which takes the value of one with probability p = Pr (Y = 1|X 1 , X 2 ,. .., X d ), and define the general logistic regression model as where β 0 is the intercept and β 1 ,. .., β d are parameters corresponding to each independent variable.These parameters can have fractional values.In the proposed approach, we scale and round them to integers as below.Define c = |β 0 |྾a as a scaling factor, where a is a positive constant.Scaling the right-hand side of (1) by c yields, We round the scaled parameters in the parentheses to the closest integer value.Hence, the score of each feature is β' k = [βk/c] for k = 0, 1,. .., d, and the approximated log of odds is given by We refer to (3) as integerized LR model.The risk score for patient i is given by scoreðx i;1 ; x i;2 ; . . .; where features are generated by categorizing the original covariates as discussed before.Thus, x i,1 , x i,2 ,. .., x i,d are binary variables.Subsequently, the conversion of a risk score to probability is through the logit function, i.e., p x i;1 ; x i;2 ; . . .; Using the proposed method, it is possible to achieve a specific range of total scores.Denote the min and max total score by S min and S max , respectively.We choose parameter a such that the total score range R = S max − S min is larger than a given value R*.Specifically, we use the following steps: i. Initialize a such that log 10 a ¼ blog 10 ðmax i¼1;...;d ðjb i =b 0 jÞc ) a ¼ 10 blog 10 ðmax i¼1;...;d ðjb i =b 0 jÞc : ð6Þ Here a is defined with base 10 logarithms to capture the order of magnitude difference between the maximum estimated coefficient and the intercept.The floor operator ensures that a is a power of 10 for simplicity of rounding.
ii.While R is smaller than R* set a = a/2.
Step (i) sets the value of a as the order of magnitude that the largest coefficient is larger than the intercept of the logistic regression model.For example, if the intercept is 100 times smaller than the largest coefficient, then a = 100, but if they are in the same order of magnitude, then a = 1.The reason behind this choice is that we want to force the variables selected for the risk score model to have a significant contribution in changing the risk of hospitalization.The coefficients of those features that only marginally change the risk, will hence be zeroed out.Step (ii) decreases the value of a, hence the scaling factor c, to allow smaller coefficients in the model in order to achieve the target minimum total score range.In the computational analysis, the scaling factor c is set to 0.83 based on the results of the LR model.
Credit scorecard model.The credit scorecard model was first developed to assess the risk of defaulting on a debt in the finance literature [43].We begin by applying a data transformation that replaces each categorical value with its WOE.We then select a subset of the variables with the highest IV.After these two steps, a LASSO logistic regression model is trained allowing for a maximum of d variables.The total score in the scorecard model is a linear function of the log of odds, that is S = A + B × log(odds).Using the LR model with WOEs as variable values, we can expand this formula as: where β k is the coefficient of feature k, and δ kj indicates whether the observation is within category j of variable k.We round the score of each category to the nearest integer, that is: Then, the risk score for variable vector (x i,1 , x i,2 ,. .., x i,d ) is given by: scoreðx i;1 ; x i;2 ; . . .; To calculate the score(x i,1 , x i,2 ,. .., x i,d ) for patient i, we first determine the category of each variable, and then use the score of that category.Subsequently, the conversion of the risk score to hospitalization probability is through the following logit function: The values of parameters A and B are determined based on two inputs: (i) the target score S 0 corresponding to a certain odds ratio θ 0 for the outcome (i.e., UTI hospitalization), (ii) points to double the odds ΔS such that S 0 + ΔS corresponds to an odds ratio of 2θ 0 .From these two inputs, we have that S 0 = A + B × log(θ 0 ) and S 0 + ΔS = A + B × log(2θ 0 ).We can solve these equations together to obtain B = ΔS/log (2) and A = S 0 − log(θ 0 ) × ΔS/log (2).Note that A can be set to 0 for multiple choices of S 0 and θ 0 such that S 0 = log(θ 0 ) × B.
As can be seen in Eq (7), the offset parameter A can be used to shift the value of the scores in the positive or negative direction.On the other hand, the scaling parameter B controls the range of the scores assigned to each category.A positive B ensures that higher scores correspond to higher risk for the outcome.In our analysis for the UTI hospitalization, we set the points to double the odds ΔS as 10, and A as 0.

Results
We select a prediction score threshold that maximizes F10 score because UTI hospitalizations are rare in our data.The weight assigned to false negatives (i.e., misclassifying patients with an actual UTI hospitalization) in F10 score is 100 times the weight of false positives (i.e., misclassifying patients without a UTI hospitalization).This reflects the fact that reviewing the case of a patient predicted as high-risk for UTI hospitalization is much more cost-effective than overlooking it and having to deal with a potential hospitalization in the future.
Fig 1 shows the F10 score for each method obtained by using different prediction thresholds over the training set.If the threshold is too low, the model predicts almost all the patients as high risk for hospitalization.In contrast, a large threshold would result in predicting no hospitalization for most of the patients.
As seen in Fig 1, if a lower threshold is chosen, the overall F10 score is fairly robust.Using the thresholds obtained from the F10 analysis, the overall performance of the integerized LR and credit scorecard methods on the test data are summarized in Table 3.
Table 4 presents the scores of the variables selected by the Integerized LR model.A higher score is associated with a higher risk of hospitalization.
Table 5 presents the scores of the variables selected by the credit scorecard model.Note that there is a score assigned to each category within a variable because this model treats different categories as related rather than separate variables.Furthermore, categories are created in such a way that their scores are monotonic.
Based on the selected modeling parameters, the range of the total score for the integerized LR model is [-4,6] while the range for the credit scorecard is [-65, 40].These models can be used to calculate a total risk score for a patient and make a classification for UTI hospitalization (0 or 1) based on a threshold value.We illustrate their usage in Appendix S1 and S2 Tables.
To assess how well the proposed scoring methods differentiate between the patients with and without UTI hospitalization, we examine Fig 2, which illustrates the score distribution within each group for both methods.The cyan and red bars represent patients with and without UTI hospitalization, respectively.The figure illustrates some aspects that relate to AUC and recall.Since integerized LR has higher recall in Table 3, it has more true hospitalizations above the threshold.On the other hand, the scorecard model has better AUC.In Fig 2, this is related to the overlap between the cyan and red bars.The scoring method that exhibits less overlap between these bars performs better in distinguishing the risk of patients with and without UTI hospitalization.We quantify the overlap between the density plots of scores for patients with and without hospitalization based on the ratio of the overlap area to the total area under each plot.
Policymakers can leverage risk score models for individualized interventions, such as phone calls [46], and preventive care programs [47], to reduce the burden of UTI hospitalizations.Population average prescriptive effect (PAPE) and area under the prescriptive effect curve (AUPEC) are two metrics that evaluate the efficiency of individualized treatment rules (ITRs) in comparison to random allocation of treatments [48].Inspired by these metrics, we randomly select a proportion of patients (e.g., 0 to 100%) from the overall data set to allocate an intervention.We repeat this sampling process 100 times and calculate the recall in each replication.For the allocation of intervention based on the risk score, we select the individuals with the highest score to receive the intervention and calculate the recall.That is, the

Discussion
We proposed two risk score models for evaluating the risk of UTI hospitalization.Both models utilize logistic regression, but they pre-process variables and allocate scores in different ways.
We also compared the approach of prioritizing interventions based on predicted risk scores to that of not prioritizing (i.e., random allocation).Both risk score models demonstrate relatively strong prediction performance given the use of administrative claims data on a population where UTI hospitalizations are rare (i.e., 0.71%).The precision in Table 3 is low due to the imbalanced nature of our dataset.Overall, our prediction performance results are similar to the ones reported in the literature using claims data from CMS.Specifically, Mao et al. [18] reported 0.63 AUC for the same outcome using a similar CMS dataset which contains beneficiaries both with and without past UTI diagnosis.They improved this result to 0.72 by clustering beneficiaries and training a model for each cluster.The prediction performance results are also comparable to the performance results for the prediction of pneumonia hospitalization using a similar data set from CMS [49].The integerized LR model performs better on recall while the credit scorecard performs better on AUC.It is

UTI hospitalization risk factors
The proposed risk score models select several common variables, including diagnosis of delirium, dementia, and other cognitive disorders, admission to ICU in the past 3 months, and SNF and inpatient Elixhauser scores.Studies regarding UTI hospitalization trends have found a rising incidence rate over the years [50,51].A study using a nationwide inpatient sample from Healthcare Cost and Utilization Project (HCUP) data set spanning between 1998 and 2011 reported that factors such as age, gender, number of Elixhauser comorbidities and weather temperature have significant correlation with the rate of UTI hospitalization [52].
Another study with patients receiving home health care found female sex, Medicaid recipient, daily living activity dependency, recent treatment for UTI, presence of a urinary catheter, and prior history of indwelling or suprapubic catheter as risk factors for UTI hospitalization [53].
A retrospective cohort study of patients living in nursing facilities found that mobility and walking ability play a significant role in reducing the risk of UTI hospitalization.Furthermore, patient characteristics such as older age, being Hispanic or African-American, along with diabetes mellitus, renal failure, Parkinson's disease, dementia, or stroke were shown to be associated with UTI hospitalization [54].In a study considering patients with complicated UTI, factors such as older age, diabetes, cancer, the history of UTI, and usage of indwelling urinary catheter or percutaneous nephrostomy were identified as risk factors for readmissions for UTI [55].Table 6 presents a list of previous studies analyzing UTI hospitalization risk factors.In our analysis, we have identified several risk factors that are common with the ones found in the literature, such as older age, number of Elixhauser comorbidities, the history of UTI, and cognitive disease.In addition to these factors, other selected variables such as ICU/CCU admissions and inpatient/outpatient costs the beneficiaries were shown to be associated with preventable hospitalizations [56,57].
In the credit scorecard model, the Elixhauser inpatient score has the widest range of points assigned.Patients without any comorbidity receive a low score of -29, while patients with more than six comorbidities receive the highest score of 3.This difference indicates that the odds of UTI hospitalization is estimated to be three times higher for the last category of this variable compared to its first category.For the credit scorecard model, it is worthwhile to note that the variable with the largest positive score (indicating an increased risk of hospitalization), is ICU admission within the past 3 months with a score of 21.For the integerized LR method, the features with the lowest and highest scores are no inpatient comorbidities in the past month with a low score of -3 and ICU/CCU admission within the past 3 months with a score of 2.
Interestingly, the points assigned in both models show that beneficiaries with lower costs exhibit a higher risk for UTI hospitalization.One possible explanation for this result is that patients with higher costs are more closely (and recently) monitored, and their medical conditions are managed more effectively.Lu et al. [57] also found that high-cost patients (total outpatient and inpatient costs) with more outpatient visits are at lower risk of potentially preventable hospitalizations and stated that the utilization of outpatient care may reduce hospitalizations through preventive care or more effective disease management.The integerized LR model only considers low costs in the inpatient care to be an important variable, whereas the credit scorecard model also considers the effect of high-cost values when evaluating the risk of UTI hospitalization.
Overall, the variables selected in the proposed risk score models and the points assigned to them provide valuable insights into the risk of UTI hospitalization.The interpretability of the variables and the transparency of the score generation ensures that the approach is implementable by design.The specific risk score building method to choose may depend on the preferences of the decision-maker about how variables are treated or scored, and about the approach to determining the range and variability of total scores.For example, the integerized LR method can be used to achieve a minimum total score range (e.g., minimum range of 20).In the credit scorecard method, the focus is on the increase in total score with respect to the increase in the risk of hospitalization.Both are reasonable approaches in practice.There are some limitations of this study.We use a 5% sample of Medicare administrative claims data which doesn't include laboratory results or vitals.We predict hospitalization risk at a monthly level, and only conduct analysis for individuals with sufficient Medicare data available.

Conclusion
We introduce two methods to build risk score models for UTI hospitalization utilizing claims data from Medicare limited data sets containing demographics, clinical history, and health care utilization information.We augment this administrative claims data by community-level variables and provider quality metrics obtained from publicly available data sources.We focus on patients with a history of UTI diagnosis.By implementing the integerized LR and credit scorecard models, we assign a risk score to up to 10 important variables associated with UTI hospitalization.Our findings emphasize the effectiveness of risk score models as practical tools for identifying high-risk patients and provide a quantitative assessment of the significance of In future studies, accounting for temporal changes in risk factors may lead to improved prediction results.Additionally, addressing the imbalance in the dataset may require advanced techniques like ensemble learning or neural networks [60,61] as traditional approaches such as oversampling and undersampling did not show improved performance in our experiments.Finally, the proposed methods can be implemented for other patient groups.Then, the baselevel risk of different patient groups can be compared by including the intercept in the total risk score calculation.We focused on beneficiaries with a relatively high risk of UTI hospitalization and presented results for those who had a previous diagnosis of UTI.Therefore, our findings may not be broadly applicable to the entire population of CMS beneficiaries.

Fig 1 .
Fig 1.The F10 score for different prediction score thresholds.Dashed line shows the total score where maximum F10 over the training set is obtained.https://doi.org/10.1371/journal.pone.0290215.g001 possible that the specific choice of prediction thresholds relates to this trade-off.In Fig2, the performance of the credit scorecard model with respect to overlap is slightly better than the integerized LR.This may be due to the differences in the range of score values.As seen in Fig3, both scoring models offer a significant improvement in the percentage of true cases covered when allocating an intervention based on predicted risk, as compared to random allocation.

Fig 2 .
Fig 2. Distribution of the scores for each method.The cyan and red bars represent patients with and without UTI hospitalization, respectively.https://doi.org/10.1371/journal.pone.0290215.g002

Table 3 . Prediction performance of the risk score models and the logistic regression (LR) model.
either allocated to a randomly selected subpopulation or allocated to the same number of individuals with the highest risk scores to show the benefit of using the risk score.Fig 3 illustrates the results of this experiment.The area between the random selection and the credit scorecard model is 0.187, and it is 0.179 for integerized LR.

Table 5 . Risk scores of the variables in the credit scorecard model (the cost values are rounded for brevity).
https://doi.org/10.1371/journal.pone.0290215.t005

Table 6 . Studies regarding UTI hospitalization.
the presence of a urinary catheter, treatment with general antibacterial and antiseptics, dependency in instrumental activities of daily living, and no available caregivers https://doi.org/10.1371/journal.pone.0290215.t006various risk factors in UTI hospitalizations.These factors include: age, number of comorbidities, admission to ICU or CCU in the last 3 months, cognitive disorders, history of bacterial or urinary tract infections, inpatient, outpatient and carrier costs in the last 6 months.